[pull] master from DataDog:master #520

Merged
pull[bot] merged 5 commits into ConnectionMaster:master from DataDog:master
May 6, 2026
@pull pull Bot commented May 6, 2026

See Commits and Changes for more details.


Created by pull[bot] (v2.0.0-alpha.4)


NouemanKHAL and others added 5 commits May 6, 2026 12:18
* feat(nutanix): add state tags and disk_status

Introduce resource state tags so capacity-planning workflows can filter
out hosts in maintenance, disconnected hosts, powered-off VMs, degraded
disks, and clusters in special operation modes:

- ntnx_maintenance_state, ntnx_connection_state on host.*
- ntnx_power_state on vm.* (was previously only on vm.status)
- ntnx_operation_mode on cluster.*
- ntnx_disk_status on host.storage_* (worst-status aggregation)

Disk status is sourced once per check from the cluster-wide
api/clustermgmt/v4.0/config/disks endpoint and cached by node ID.
_report_stats gains an extra_tags_by_key parameter so disk_status is
scoped to host.storage_* metrics only.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
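
As a rough illustration of the per-node cache and worst-status aggregation described in the commit above, here is a minimal sketch. It is not the integration's code: the `status` key, the `EXAMPLE_*` degraded values, and the precedence (degraded, then normal, then unknown) are assumptions made for this example; only `nodeExtId`, `NORMAL`, and the `degraded` tag value appear in the commit messages.

```python
from collections import defaultdict

# Placeholder values: the real degraded enum values are defined by the Nutanix API.
EXAMPLE_DEGRADED = frozenset({"EXAMPLE_FAILED", "EXAMPLE_OFFLINE"})


def build_disks_by_node(disks):
    """Group disk status strings by the node that owns each disk."""
    by_node = defaultdict(list)
    for disk in disks:
        node_id = disk.get("nodeExtId")
        if node_id:  # skip disks that cannot be attributed to a host
            by_node[node_id].append(disk.get("status"))  # "status" key is assumed
    return dict(by_node)


def aggregate_disk_status(statuses, degraded=EXAMPLE_DEGRADED):
    """Return the worst status seen across one host's disks."""
    seen = set(statuses)
    if seen & degraded:
        return "degraded"
    if seen == {"NORMAL"}:
        return "normal"
    return "unknown"  # empty input or values this sketch does not recognize


disks = [
    {"nodeExtId": "n1", "status": "NORMAL"},
    {"nodeExtId": "n1", "status": "EXAMPLE_FAILED"},
    {"nodeExtId": "n2", "status": "NORMAL"},
    {"status": "NORMAL"},  # no owning node, so it is skipped
]
cache = build_disks_by_node(disks)
tags = {node: f"ntnx_disk_status:{aggregate_disk_status(s)}" for node, s in cache.items()}
```

Building the cache once per check run means each host lookup is a dictionary access rather than another API call.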

* test(nutanix): cover disk_status and state tag extraction

Add focused unit tests for the new state-tag and disk_status paths:

- test_disk_status: parametrized aggregation matrix (NORMAL, degraded
  states, $UNKNOWN/$REDACTED, mixed, forward-compat enum values),
  cache-build defensiveness (skips disks missing nodeExtId), and an
  end-to-end check that a degraded disk flows ntnx_disk_status:degraded
  onto storage_* metrics. Disks-endpoint failure is asserted to leave
  storage metrics emitted without the tag.
- test_tag_extraction: defensive checks for missing hypervisor block,
  missing config.operationMode, empty-string field values, and verbatim
  PAUSED power-state preservation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(nutanix): record fixtures by resource selectively

Add --resources / -r flag to the fixture recorder accepting a comma-
separated or repeated list of resources to refresh, plus --list to
enumerate available names. When a dependent resource is requested
without its prerequisites (e.g., host_stats without clusters/hosts),
the prerequisites are fetched in memory but their fixtures are not
overwritten.

Adds record_disks() targeting api/clustermgmt/v4.0/config/disks.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
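
The selective-recording behavior above could be sketched roughly as follows. This is an illustration only: the `PREREQUISITES` map and the function names are hypothetical, and the real recorder's argument handling may differ. It shows a flag that accepts comma-separated or repeated values, and a plan that fetches prerequisites without marking them for overwrite.

```python
import argparse

# Hypothetical prerequisite map, for illustration only.
PREREQUISITES = {
    "clusters": [],
    "hosts": ["clusters"],
    "host_stats": ["clusters", "hosts"],
    "disks": [],
}


def parse_resources(argv):
    """Parse -r/--resources (comma-separated or repeated) and --list."""
    parser = argparse.ArgumentParser()
    parser.add_argument("-r", "--resources", action="append", default=None)
    parser.add_argument("--list", action="store_true")
    args = parser.parse_args(argv)
    requested = []
    for chunk in args.resources or []:
        for name in chunk.split(","):
            name = name.strip()
            if name and name not in requested:
                requested.append(name)
    return args.list, requested


def plan(requested):
    """Return (to_fetch, to_write): prerequisites are fetched but not written."""
    to_fetch = []

    def visit(name):
        for dep in PREREQUISITES[name]:  # raises KeyError for unknown names
            visit(dep)
        if name not in to_fetch:
            to_fetch.append(name)

    for name in requested:
        visit(name)
    return to_fetch, [n for n in to_fetch if n in requested]


_, requested = parse_resources(["-r", "host_stats,disks", "-r", "hosts"])
to_fetch, to_write = plan(["host_stats"])
```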

* feat(nutanix): expose state tags in Capacity Planning

Consolidate the Capacity Planning section into three unified tables —
Clusters, Hosts, VMs — each grouped by their state tags so capacity
planners can filter inline by maintenance/connection/power/operation
state. Add a Storage Capacity by Host table grouped by ntnx_disk_status
to surface degraded storage. The intro note now summarizes the inputs
and the available state tags.

Datadog lowercases tag values at ingestion, so all tag-value filters in
the dashboard use lowercase (e.g., ntnx_connection_state:connected).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(nutanix): tighten disk_status helpers

Minor compression in the disk_status path with no behavior change:

- _aggregate_disk_status: drop the redundant empty-set guard and
  inline the degraded_states constant — set ops on empty sets work,
  the constant was used once.
- _report_stats: collapse the 4-line param docstring to one line.
- _process_single_host: inline the disk_status_extra_tags local;
  it was used exactly once.
- _get_disk_status_storage_tags: fold the storage-keys tuple onto
  one line.

Net -22 lines in infrastructure_monitor.py; all 136 unit tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(nutanix): guard disks API consumers against malformed entries

The new /api/clustermgmt/v4.0/config/disks consumers
(_build_disks_by_host_cache, _aggregate_disk_status) iterate over
response items and call .get(...) on each. If Nutanix ever returns a
null entry or a non-dict, that would raise AttributeError. Add
isinstance(d, dict) filters at both sites and parametrized test
coverage for None / strings / ints mixed into the disk list.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
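
A minimal sketch of the guard described above, assuming the response items arrive as a plain list. The helper name `iter_disk_dicts` is hypothetical; the commit only states that isinstance(d, dict) filters were added at both call sites.

```python
def iter_disk_dicts(items):
    """Yield only the dict entries from a disks response, skipping anything else."""
    for d in items or []:
        if isinstance(d, dict):
            yield d


# None, strings, and ints are dropped instead of raising AttributeError on .get().
mixed = [None, "disk", 3, {"nodeExtId": "n1"}]
node_ids = [d.get("nodeExtId") for d in iter_disk_dicts(mixed)]
```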

* fix(nutanix): emit vm.disk_capacity_bytes from config

The metric was mapped from diskCapacityBytes in VM_STATS_METRICS, but
the Nutanix v4 stats endpoint never returns that field — so the metric
was effectively never emitted (the existing test list parked it under
VM_STATS_METRICS_OPTIONAL, which silently let it slip).

Source it instead from the VM config: sum
vm.disks[].backingInfo.diskSizeBytes. Same approach as
memory.allocated_bytes from memorySizeBytes.
This makes the metric reliably available for capacity-planning queries
and the dashboard's per-VM Disk Allocated column now populates.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
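
The config-based sum could look roughly like the sketch below. The function name and the choice to return None when no disk reports a size are assumptions for illustration; the field path vm.disks[].backingInfo.diskSizeBytes is taken from the commit message.

```python
def extract_vm_disk_capacity_bytes(vm):
    """Sum diskSizeBytes across a VM's disks, or return None if none report a size."""
    disks = vm.get("disks")
    if not isinstance(disks, list):
        return None
    total = 0
    found = False
    for disk in disks:
        if not isinstance(disk, dict):
            continue  # tolerate null or malformed entries
        backing = disk.get("backingInfo")
        if not isinstance(backing, dict):
            continue
        size = backing.get("diskSizeBytes")
        # bool is a subclass of int, so exclude it explicitly
        if isinstance(size, (int, float)) and not isinstance(size, bool):
            total += size
            found = True
    return total if found else None


vm = {
    "disks": [
        {"backingInfo": {"diskSizeBytes": 10}},
        {"backingInfo": {}},  # missing size
        None,                 # malformed entry
        {"backingInfo": {"diskSizeBytes": 5}},
    ]
}
capacity = extract_vm_disk_capacity_bytes(vm)
```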

* fix(nutanix): correct host power metric name in optional test list

HOST_STATS_METRICS_OPTIONAL had nutanix.host.power.consumption.instant_watt
(dots) but the canonical metric — emitted by the integration and matching
the cluster equivalent — is nutanix.host.power_consumption_instant_watt
(underscores). The typo made the OPTIONAL entry dead: if the metric were
ever emitted in a power-meter-equipped environment, assert_all_metrics_covered()
would fail because the OPTIONAL bucket entry didn't actually cover it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(nutanix): add per-metric state-tag coverage guards

Six tests that introspect the aggregator after a check run and assert
every new state tag lands on every applicable metric:

- ntnx_maintenance_state on every host metric for hosts whose source
  data has the field
- ntnx_connection_state on every host metric (both fixture hosts have it)
- ntnx_power_state on every VM metric
- ntnx_operation_mode on every cluster metric
- ntnx_disk_status on every host.storage_* metric
- ntnx_disk_status NOT present on non-storage host metrics (scoping guard)

If someone removes a tag emission from one of the _extract_*_tags methods,
the failure message names every metric that lost the tag rather than
failing at the first bundled assertion in test_hosts/test_vms/test_clusters
and stopping there.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
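
The "name every metric that lost the tag" idea can be sketched without the real aggregator stub. Here metrics are modeled as plain (name, tags) tuples, which is an assumption made to keep the example self-contained; the actual tests introspect the aggregator after a check run, and the metric names below are made up.

```python
def metrics_missing_tag(metrics, prefix, tag_key):
    """Return the sorted names of metrics under `prefix` that lack a `tag_key:` tag."""
    return sorted(
        {
            name
            for name, tags in metrics
            if name.startswith(prefix)
            and not any(t.startswith(tag_key + ":") for t in tags)
        }
    )


# Hypothetical submitted metrics, for illustration only.
submitted = [
    ("nutanix.vm.example_a", ["ntnx_power_state:on"]),
    ("nutanix.vm.example_b", []),
    ("nutanix.vm.example_c", ["other:tag"]),
    ("nutanix.host.example_d", []),
]
missing = metrics_missing_tag(submitted, "nutanix.vm.", "ntnx_power_state")
# A guard test would then assert `not missing` and print the full list on failure.
```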

* fix changelog pr number

* fix(nutanix): address review feedback for state tags

Bundle of changes driven by the PR review:

- Lowercase state-tag values (ntnx_connection_state, ntnx_operation_mode,
  ntnx_power_state) at emission so the code matches Datadog's
  ingestion-time normalization explicitly rather than relying on it
  implicitly. ntnx_maintenance_state was already lowercase from the API.
- Always emit ntnx_power_state, falling back to "unknown" when the
  source field is missing/empty. Previously the tag silently dropped
  for those VMs, breaking dashboards/monitors that group by power_state.
- Hoist DEGRADED_DISK_STATUSES and HOST_STORAGE_STAT_KEYS to
  datadog_checks/nutanix/metrics.py as module-level frozensets, and
  derive HOST_STORAGE_METRICS in tests/metrics.py from them — eliminates
  the three independent enumerations that could drift if a fifth
  storage stat is added.
- Log the disk count alongside the host bucket count when caching
  ("Cached %d disks across %d hosts") for easier triage.
- Move the InfrastructureMonitor stub fixture used by tag-logic unit
  tests into tests/conftest.py with the union of attributes both files
  need; remove the divergent local fixtures.
- Add a parametrized unit test for _extract_vm_disk_capacity_bytes
  covering missing disks, missing backingInfo/diskSizeBytes, and
  non-dict entries — previously only happy-path e2e coverage existed.

156 unit tests pass; 1 dropped review finding (functional reviewer's
"silent failure" claim about the dashboard) was correctly contested
by the integrations reviewer — Datadog auto-lowercases tag values at
ingestion. Lowercasing at emission is purely a clarity improvement.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(nutanix): guard state-tag .lower() against non-string values

The previous commit introduced .lower() on the four state tags
(maintenance, connection, operation_mode, power) to align emission
with Datadog's lowercased ingestion. If the Nutanix API ever returns
a non-string value (e.g. an int, bool, or list — implausible per the
v4 schema, but possible if the API misbehaves), .lower() would raise
AttributeError. The previous walrus-only pattern would have just
formatted the unusual value via __str__ — no crash.

Add a tiny module-level _norm_state(value) helper:
- Returns value.lower() when value is a non-empty str.
- Returns None for missing, empty, or non-string values.

Threading it through the four sites preserves the walrus :=
pattern and keeps each emission a single line. For VM powerState
the fallback to "unknown" is unchanged.

Adds parametrized regression tests covering int/bool/list inputs
on host, cluster, and VM tag extraction. 162 unit tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(nutanix): rewrite changelog entries for customer audience

- Drop 23578.fixed.1: the always-emit ntnx_power_state behavior is
  part of the tag's first shipped contract (added in this same PR);
  not a fix to a previously-released feature.
- Rewrite 23578.added to lead with the user benefit (capacity-planning
  filtering) and structure the tags as bullets.
- Trim 23578.fixed of internal field names; surface only what users
  observe ("the metric now reports values where it didn't before").

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(nutanix): split vm.disk_capacity_bytes fix into separate PR

Move the vm.disk_capacity_bytes fix out of this PR; it ships separately
in PR #23583. This PR is now state-tags-only.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(nutanix): drop private-method tests in favor of check-level coverage

Remove test_tag_extraction.py and test_state_tag_coverage.py, and trim
test_disk_status.py to only the integration-style tests that exercise
the check end-to-end. Coverage of the new state tags is preserved
through HOST_TAGS/CLUSTER_TAGS/PCVM_TAGS in the existing per-entity
tests. Also drop the now-unused monitor fixture from conftest, simplify
_norm_state, and remove a redundant defensive check in
_aggregate_disk_status.

* docs(nutanix): rewrite state-tags changelog for users

* docs(nutanix): collapse state-tags changelog into one line

* docs(nutanix): frame state-tags changelog as a new feature

* fix(nutanix): always emit state tags with unknown fallback

Make ntnx_maintenance_state, ntnx_connection_state, ntnx_operation_mode,
and ntnx_disk_status emit consistently like ntnx_power_state already
does — always present, falling back to "unknown" when the source field
is missing. Also normalize the spec-defined sentinel values ($UNKNOWN,
$REDACTED, UNDETERMINED) to "unknown" so they don't surface as ugly
tag values like ntnx_operation_mode:$unknown.

Without this, dashboards and monitors filtering on these tags silently
drop entities whose source field is missing — the same failure mode
the power-state fallback was originally added to prevent.
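
The always-emit behavior with sentinel normalization might be sketched like this. The function name `state_or_unknown` and the `SENTINELS` constant are hypothetical; the sentinel values themselves and the "unknown" fallback are taken from the commit message.

```python
# Sentinel values named in the commit, compared after lowercasing.
SENTINELS = frozenset({"$unknown", "$redacted", "undetermined"})


def state_or_unknown(value):
    """Lowercase a state value, mapping missing, non-string, or sentinel values to 'unknown'."""
    if not isinstance(value, str) or not value:
        return "unknown"
    lowered = value.lower()
    return "unknown" if lowered in SENTINELS else lowered


# The tag is always present, so group-by queries keep entities with missing fields.
tag = f"ntnx_operation_mode:{state_or_unknown('$UNKNOWN')}"
```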

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Isolate ddev release-notes extraction in an unprivileged job

The publish job holds contents: write and id-token: write. Installing
ddev there pulled the full transitive dependency graph into the same
workspace as the release artifacts and credentials, giving any
compromised dependency a path to tamper with archives/installers
before upload (AI-6799).

Move the install + ddev release changelog show into a new
extract-release-notes job with permissions: contents: read. The
publish job downloads release-notes.md by exact artifact name and
no longer runs any pip install. Extraction stays best-effort: on
failure we still upload an empty file so the release proceeds with
an empty body rather than blocking after PyPI has already published.

* Drop unused define-tags dependency from extract-release-notes

* Make release-notes extraction always succeed and always upload

Previously only the ddev release changelog show step handled failure
gracefully. Setup, install, and ddev config could still fail and skip
the artifact upload, which would in turn cause publish to be skipped
because of the needs dependency, blocking PyPI and the GitHub release.

Pre-create an empty release-notes.md, mark the install/configure/extract
steps continue-on-error: true, and run the upload with if: always(). The
job's conclusion is now success regardless of transient setup or install
failures, and an artifact (possibly empty) is always available for the
publish job.
* Add n8n default

* Add changelog

* Apply suggestion from @sarah-witt

* validate

---------

Co-authored-by: Juanpe Araque <juanpedro.araque@datadoghq.com>
* Fill in metadata.csv descriptions

Populate empty descriptions for datadog.agent.python.version,
datadog.agent.running, and datadog.agent.started, sourced from
https://docs.datadoghq.com/getting_started/agent/#agent-metrics

* Refine metadata.csv descriptions

* Address review: natural language tags and uniform quoting
* Handle promotion failure on forks

* use pull_request_target
@pull pull Bot locked and limited conversation to collaborators May 6, 2026
@pull pull Bot added the ⤵️ pull label May 6, 2026
@pull pull Bot merged commit cd0a6d0 into ConnectionMaster:master May 6, 2026

4 participants